Abstract

Hard-threshold estimators are popular in signal processing applications. We provide a detailed study 
of using hard-threshold estimators for estimating an unknown deterministic signal when additive white 
Gaussian noise corrupts observations. The analysis, depending heavily on Cramer-Rao bounds, motivates 
piecewise-linear estimation as a simple improvement to hard thresholding. We compare the performance 
of two piecewise-linear estimators to a hard-threshold estimator. When either piecewise-linear estimator 
is optimized for the decay rate of the basis coefficients, its performance is better than the best possible 
with hard thresholding. 
I. Introduction

Removing noise from signals ("denoising") is a problem central to many engineering disciplines. As 
summarized nicely by Moulin and Liu [1], most methods fall into at least one of three categories: Bayesian 
techniques, which assume a probabilistic prior for the unknown signal and minimize an error measure 
given the observations; minimax techniques, which are designed for good worst-case performance over 
some broad class of signals; and techniques based on the minimum description length (MDL) principle. 
In the electrical engineering literature, the Bayesian and MDL approaches are more common than the 
minimax approach familiar to statisticians. 

In many fields, especially image processing and geophysics, the use of the wavelet domain is prominent,
regardless of which of the three approaches is undertaken. For Bayesian techniques, the wavelet
domain is convenient because it allows low-complexity diagonal [2]-[4] or nearly-diagonal [5] estimators
to be used with little loss in performance. Minimax estimation performance is intimately connected
with nonlinear approximation [6], so the approximation power of wavelet bases for many classes of
signals makes them appropriate [7]-[9]. Finally, the success of wavelet-based compression makes wavelet
representations suitable for generating regularization terms in MDL [10], [11]. All three approaches,
under appropriate conditions, justify simple hard threshold or soft threshold (shrinkage) estimators.

In this paper we study the classical problem of estimating a signal in the presence of additive white 
Gaussian noise (AWGN) with the goal of minimizing mean-squared error (MSE). Rather than applying the 
Bayesian formulation, we cast this as the estimation of a non-random parameter vector. This allows us to 
explain the performance of hard threshold estimators through the bias-variance trade-off and the Cramer-Rao
bound (CRB). The (biased) CRB provides more insight than the standard "oracle" bound of [12, Ch. 10]. 
Shaping the bias and resulting MSE inspires the analysis and optimization of two alternatives to hard 
thresholding. Both are piecewise-linear functions, and one has been proposed previously as the "semisoft 
shrinkage" estimator [13]. Our focus is not on the invention of simple estimators, but rather on the fact 
that the performance of such estimators can be better understood with the analysis presented herein. 
Furthermore, we show that the degrees of freedom in these estimators can be optimized, given the decay 
rate of coefficients, to achieve lower average estimation error than that incurred by hard thresholding. 

The paper is organized as follows. In Section II we review the definition of bias and estimation error
bounds for both unbiased and biased estimators. We then analyze hard-threshold estimators in Section III,
notably explaining their performance using Cramer-Rao bounds (CRBs). Inspired by this way of
understanding hard-threshold estimators, we analyze two alternative estimators in Section IV. In particular,



DRAFT 



February 2, 2008 




we show that these estimators can be optimized for the decay rate of the unknown deterministic parameter 
vector, resulting in uniform improvement over hard thresholding. We also discuss the limiting cases that 
relate the alternative estimators to hard-threshold estimators. Finally, Section V provides a discussion of
the key results that emerge from our analysis. 



II. Background on Estimation Error Bounds 

A. Estimation in White Gaussian Noise 

In this paper we consider non-random signal estimation when the observation is the signal plus white
Gaussian noise. In particular, assume that the observed signal is expanded in some basis of our choice,
and the $N \in \mathbb{Z}^+$ basis coefficients¹ of interest are stacked in an $N \times 1$ vector $y \in \mathbb{R}^N$, such that

$$y = x + w, \tag{1}$$

where $x \in \mathbb{R}^N$ and $w \in \mathbb{R}^N$ are $N \times 1$ column vectors representing the corresponding signal and
noise basis coefficients, respectively, when expanded in the same basis. Here, $x$ is deterministic yet
unknown, and $w$ is a random, zero-mean ($E[w] = 0$) white Gaussian noise vector with correlation
matrix $E[ww^T] = \sigma_w^2 I_N$, where $I_N$ is the $N \times N$ identity matrix. Thus $y$ is a Gaussian random vector
with mean $x$ and covariance matrix $\sigma_w^2 I_N$, i.e., the probability density function of $y$ is

$$p(y;x) = \frac{1}{(2\pi\sigma_w^2)^{N/2}} \exp\left(-\frac{\|y - x\|^2}{2\sigma_w^2}\right), \quad \text{i.e., } y \sim \mathcal{N}(x, \sigma_w^2 I_N). \tag{2}$$

Before embarking on our analysis, let us review and establish some notation. An estimator for $x$, denoted
by $\hat{x}(y)$, is a deterministic function of $y$ that maps an observation vector in $\mathbb{R}^N$ into the parameter space
$\mathbb{R}^N$.² Given an estimator, we define the error as

$$e(y) = \hat{x}(y) - x, \tag{3}$$

which is an $N \times 1$ random vector. Its mean value is termed the bias of the estimator and is denoted by

$$b(x) = E[\hat{x}(y)] - x. \tag{4}$$

Note that the bias of an estimator is in general a function of the parameters being estimated (i.e.,
$x$), so it is generally not trivial to modify or eliminate the bias of an estimator arbitrarily.

¹Throughout this paper we assume $N$ is finite, although it can be arbitrarily large.
²The observation space and parameter space need not be identical in general.
The performance of an estimator is often assessed by its $N \times N$ positive semi-definite error correlation
matrix

$$\Lambda_e(x) = E[e(y)e^T(y)]. \tag{5}$$

Note that the expected squared $\ell_2$-norm of the error vector can be obtained from the error correlation
matrix by taking its trace, i.e.,

$$\mathrm{mse}(x) = E\left[\|\hat{x}(y) - x\|^2\right] = \mathrm{tr}(\Lambda_e). \tag{6}$$

B. Unbiased Estimators 

An estimator is unbiased if it satisfies $b(x) = 0$ for all $x$. It is well known in estimation theory [14,
Ch. 2] that the Cramer-Rao bound yields a global performance bound for all unbiased estimators as

$$\Lambda_e - I_y^{-1}(x) \succeq 0, \tag{7}$$

where '$\succeq$' indicates that the matrix on the left hand side is positive semi-definite. Here $I_y(x)$ is the
$N \times N$ Fisher information matrix with elements

$$[I_y(x)]_{n,m} = -E\left[\frac{\partial^2}{\partial x_n \partial x_m} \ln p(y;x)\right] \tag{8}$$

for $n, m = 1, \ldots, N$. The Cramer-Rao bound is always a lower bound (in the positive semi-definite
sense); however, it may not be possible to satisfy it with equality. In particular, the left hand side of (7)
is equal to $0$ if and only if the efficient estimator

$$\hat{x}(y) = x + I_y^{-1}(x)\left(\nabla_x \ln p(y;x)\right)^T, \tag{9}$$

where $\nabla_x = [\partial/\partial x_1, \ldots, \partial/\partial x_N]$, exists, i.e., the right hand side of (9) must be independent of $x$. This
is indeed the case for the AWGN problem studied in this paper. In particular, we find that

$$I_y^{-1}(x) = \sigma_w^2 I_N, \tag{10}$$

and $\Lambda_e = I_y^{-1}(x)$ is satisfied when the maximum-likelihood estimator is used, i.e., when

$$\hat{x}(y) = y. \tag{11}$$

Indeed, substituting $\nabla_x \ln p(y;x) = (y - x)^T/\sigma_w^2$ and (10) into (9) yields (11), which does not depend on $x$.

Equations (10) and (11) imply that the expected squared $\ell_2$-norm of the error for any unbiased estimator is
no less than $N\sigma_w^2$, and this minimum is achieved by the maximum-likelihood estimator, which is not only a
linear estimator but also the trivial identity function. Furthermore, when the maximum-likelihood estimator is
employed, the estimation errors for the individual elements of $x$ are statistically independent.
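The optimality of the identity estimator can be checked numerically. The Python sketch below (standard library only) estimates $E[\|\hat{x}(y) - x\|^2]$ for $\hat{x}(y) = y$ by Monte Carlo simulation under the model (1)-(2) and compares it with $N\sigma_w^2$. The function name, the parameter vector, and the noise level are hypothetical choices made only for illustration.

```python
import random


def ml_mse_monte_carlo(x, sigma_w, trials=20000, seed=0):
    """Monte Carlo estimate of E[||x_hat(y) - x||^2] for the ML estimator x_hat(y) = y.

    The observation model is y = x + w with w ~ N(0, sigma_w^2 I_N), as in (1)-(2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        y = [xn + rng.gauss(0.0, sigma_w) for xn in x]  # one noisy observation vector
        x_hat = y                                        # ML estimate (11): the identity map
        total += sum((xh - xn) ** 2 for xh, xn in zip(x_hat, x))
    return total / trials


x = [3.0, -1.0, 0.5, 0.0, 2.0]  # hypothetical deterministic parameter vector (N = 5)
sigma_w = 0.7
print(ml_mse_monte_carlo(x, sigma_w), len(x) * sigma_w ** 2)  # the two values should be close
```

The simulated error does not depend on the particular values in `x`, which reflects the fact that (10) is independent of the parameter.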
For unbiased estimators, the CRB on the error covariance matrix is independent of the (orthonormal) basis
representation, so in particular a decomposition into a wavelet basis has no advantage over any other basis [14,
Ch. 2]. This picture changes significantly, however, when we turn our attention to biased estimators.

C. Biased Estimators 

The set of all estimators that satisfy $b(x) \neq 0$ for some $x$ constitutes the class of biased estimators.
The Cramer-Rao bound for biased estimators is given by [14, Ch. 2]

$$\Lambda_e - b(x)b^T(x) - \left(I_N + \frac{\partial b(x)}{\partial x}\right) I_y^{-1}(x) \left(I_N + \frac{\partial b(x)}{\partial x}\right)^T \succeq 0, \tag{12}$$

where '$\succeq$', once again, denotes the positive semi-definiteness of the matrix on the left hand side, and
$\partial b(x)/\partial x$ is the $N \times N$ Jacobian of the bias.
Unfortunately, the Cramer-Rao bound in the biased case is less useful than it is in the unbiased case. In
particular, (12) is interpreted as a lower bound for all estimators with bias $b(x)$, but it is not guaranteed
that multiple estimators can satisfy a given bias function $b(x)$ (unless $b(x)$ is trivially a constant).
Furthermore, in general the bound cannot be satisfied with equality. However, as we shall see shortly,
the Cramer-Rao bound can provide valuable insight into the performance of particular estimators.

It is worth emphasizing that the Cramer-Rao bound in (12) depends on only three quantities: the Fisher
information matrix, the bias, and the gradient of the bias in the parameter space. The Fisher information
matrix is a function of the probability density function of the observed data and is therefore fixed for
a given problem setup; for example, the Fisher information matrix for the AWGN problem analyzed in
this paper is $\sigma_w^{-2} I_N$. Thus the Cramer-Rao bound can be thought of as a function of only the latter two
quantities.

Because there is no global lower bound on the performance of biased estimators, a general treatment
is not possible. Hence, the utility of the biased Cramer-Rao bound is best demonstrated via an example.
Due to its popularity in wavelet-based estimation techniques, we shall first focus our attention on hard
thresholding estimators.

III. Analysis of Hard Thresholding

A hard thresholding estimator acts on each basis coefficient observation $\{y_n\}_{n=1}^{N}$ independently and
estimates the true value of each signal basis coefficient according to

$$\hat{x}^{ht}(y_n) = \begin{cases} 0, & \text{if } |y_n| < T, \\ y_n, & \text{if } |y_n| \geq T, \end{cases} \tag{13}$$
Fig. 1. The input/output relation of a hard thresholding estimator.
(a) Bias (b) Mean-square error



Fig. 2. The bias and mean-square error of a hard thresholding estimator as a function of the true value of the parameter. The 
mean-square error is normalized to the variance of the noise, whereas the bias and the parameter values are normalized to the 
standard deviation of the noise, such that all axis variables are dimensionless. 

where $T > 0$ is called the threshold, as shown in Figure 1. Note that, because the estimator acts on each
coefficient separately and because the noise on each coefficient is independent in the AWGN estimation
problem, we can restrict our analysis to the scalar estimation case with no loss of generality.
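A minimal Python sketch of the hard-threshold rule (13) follows. `hard_threshold` is a hypothetical helper name, and the observation values are arbitrary. Because the rule acts on each coefficient independently, the code simply applies it element by element.

```python
def hard_threshold(y, T):
    """Hard-threshold estimator (13), applied to each observed coefficient independently."""
    if T <= 0:
        raise ValueError("the threshold T must be positive")
    # Coefficients with |y_n| < T are set to zero; all others pass through unchanged.
    return [0.0 if abs(yn) < T else yn for yn in y]


print(hard_threshold([0.3, -2.5, 1.9, 2.0, -0.1], T=2.0))
# [0.0, -2.5, 0.0, 2.0, 0.0]
```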

The hard thresholding estimator differs from the maximum-likelihood estimator in (11) only when the
observation $y_n$ has absolute value less than the threshold $T$. In this case the hard thresholding estimator
estimates the true value of the underlying signal as 0, and this squelching action introduces a bias given
by³

$$b^{ht}(x_n) = -\int_{-T}^{T} y\, p_w(y - x_n)\, dy, \tag{14}$$

where $p_w(y) = \exp(-y^2/2\sigma_w^2)/\sqrt{2\pi\sigma_w^2}$. This bias is plotted in Figure 2(a) for various threshold values.
Note that the bias is anti-symmetric, i.e., $b^{ht}(-x_n) = -b^{ht}(x_n)$.
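As a sanity check on (14), the sketch below evaluates the bias integral with composite Simpson quadrature and compares it with the closed-form expression (31) from Appendix A, using $Q(z) = \mathrm{erfc}(z)/2$. The helper names, the choice $T = 2\sigma_w$, and the quadrature resolution are assumptions made for illustration. The accompanying checks also confirm the anti-symmetry of the bias.

```python
import math


def p_w(y, sigma_w):
    """Zero-mean Gaussian noise density p_w(y) with variance sigma_w^2."""
    return math.exp(-y * y / (2.0 * sigma_w ** 2)) / math.sqrt(2.0 * math.pi * sigma_w ** 2)


def simpson(f, a, b, steps=2000):
    """Composite Simpson rule on [a, b]; `steps` must be even."""
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def bias_ht_numeric(x, T, sigma_w):
    """Bias (14): b(x) = -integral_{-T}^{T} y p_w(y - x) dy."""
    return -simpson(lambda y: y * p_w(y - x, sigma_w), -T, T)


def bias_ht_closed(x, T, sigma_w):
    """Closed form (31), with Q(z) = erfc(z) / 2 as in (32)."""
    q = lambda z: 0.5 * math.erfc(z)
    xs = (x + T) / (math.sqrt(2.0) * sigma_w)
    xd = (x - T) / (math.sqrt(2.0) * sigma_w)
    return sigma_w * ((2.0 * math.pi) ** -0.5 * (math.exp(-xd ** 2) - math.exp(-xs ** 2))
                      - (x / sigma_w) * (q(xd) - q(xs)))


sigma_w, T = 1.0, 2.0
for x in (-3.0, 0.0, 1.0, 2.0, 5.0):
    print(x, round(bias_ht_numeric(x, T, sigma_w), 6), round(bias_ht_closed(x, T, sigma_w), 6))
```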

The mean-square error is found to be

$$\mathrm{mse}^{ht}(x_n) = \sigma_w^2 - 2x_n b^{ht}(x_n) - \int_{-T}^{T} y^2\, p_w(y - x_n)\, dy. \tag{15}$$

It is worth identifying the terms contributing to this expression. The first term on the right hand side of
(15) is the mean-square error obtained when a maximum-likelihood estimator is used. The middle term
is always nonnegative (it vanishes only at $x_n = 0$), because $b^{ht}(x_n)$ and $x_n$ have opposite signs, and it
therefore raises the mean-square error above the minimum achievable with an unbiased estimator. On the
other hand, the rightmost term is negative, hence it reduces the mean-square error. Therefore, the overall
performance of a hard thresholding estimator, relative to the maximum-likelihood estimator, is determined
by which of the latter two terms has greater magnitude. Figure 2(b) plots the mean-square error, normalized
to the variance of the noise, against the true value of the parameter $x_n$, normalized to the standard deviation
of the noise. The figure shows that the performance of a hard thresholding estimator depends on the value
of $x_n$. Recalling that the maximum-likelihood estimator has mean-square error equal to $\sigma_w^2$ independent
of $x_n$, we observe from the plots that when $x_n$ is within approximately one standard deviation of 0, the
mean-square error of the hard thresholding estimator is less than that of the maximum-likelihood estimator.
On the other hand, if $|x_n| \gg T$, then the mean-square error approaches that of the maximum-likelihood
estimator, because the probability that the noise will push the observation into the thresholding region
becomes very small. However, in the intermediate regime, when the true value of the signal is on the same
order as the threshold, the mean-square error of the thresholding estimator is worse than that of the
maximum-likelihood estimator, because the noise can push the observation to either side of the threshold,
leading to significant errors in the estimate.
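The three regimes described above can be reproduced numerically. This sketch evaluates (15) by quadrature and checks it against a direct Monte Carlo simulation of the estimator (13) at $x_n = 0$, $x_n = T$, and $x_n = 3T$. It assumes $T = 2\sigma_w$ (as in Figure 3), and the helper names and sample sizes are hypothetical.

```python
import math
import random


def p_w(y, sigma_w):
    return math.exp(-y * y / (2.0 * sigma_w ** 2)) / math.sqrt(2.0 * math.pi * sigma_w ** 2)


def simpson(f, a, b, steps=2000):
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def mse_ht(x, T, sigma_w):
    """Eq. (15): sigma_w^2 - 2 x b(x) - integral_{-T}^{T} y^2 p_w(y - x) dy."""
    bias = -simpson(lambda y: y * p_w(y - x, sigma_w), -T, T)        # eq. (14)
    second = simpson(lambda y: y * y * p_w(y - x, sigma_w), -T, T)
    return sigma_w ** 2 - 2.0 * x * bias - second


def mse_ht_monte_carlo(x, T, sigma_w, trials=100000, seed=1):
    """Direct simulation of hard thresholding (13) applied to y = x + w."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        y = x + rng.gauss(0.0, sigma_w)
        x_hat = 0.0 if abs(y) < T else y
        acc += (x_hat - x) ** 2
    return acc / trials


sigma_w, T = 1.0, 2.0
for x in (0.0, 2.0, 6.0):  # near zero, at the threshold, far above it
    print(x, round(mse_ht(x, T, sigma_w), 4), round(mse_ht_monte_carlo(x, T, sigma_w), 4))
```

Comparing each printed value with $\sigma_w^2 = 1$ shows the gain near zero, the penalty near the threshold, and the return to maximum-likelihood performance far from zero.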

We can use the Cramer-Rao bound for biased estimators to gain further insight into hard thresholding.
Figure 3 compares the normalized mean-square error of a hard thresholding estimator to the unbiased
and biased Cramer-Rao bounds, as well as the "optimal oracle" lower bound obtained in [12, Ch. 10]

³Because the explicit expressions for the bias and mean-square error are cumbersome, we defer them to Appendix A and
state here instead concise integral expressions for these quantities.
Fig. 3. The mean-square error of a hard thresholding estimator with $T = 2\sigma_w$, together with the unbiased Cramer-Rao bound
(CRB,U), the biased Cramer-Rao bound (CRB,B) for all estimators with bias given by $b^{ht}(x_n)$, and the "optimal oracle" bound
(OR) derived in [12, Ch. 10] (see Appendix B). Because the hard thresholding estimator is biased, its mean-square error can
go lower than the unbiased Cramer-Rao bound.



and reproduced in Appendix B for convenience. Notice from the figure that the oracle bound is a very
weak lower bound that does not capture the oscillatory behavior of the hard thresholding mean-square
error. On the other hand, the biased Cramer-Rao bound, a lower bound for all estimators with bias given by
(14), follows the same oscillatory trend as the hard thresholding mean-square error. In the scalar AWGN
case, the biased Cramer-Rao bound simplifies to

$$\mathrm{mse}_n(x_n) \geq b_n^2(x_n) + \sigma_w^2\left(1 + \frac{\partial b_n(x_n)}{\partial x_n}\right)^2, \tag{16}$$

which depends only on the bias and its derivative with respect to $x_n$. Hence, the oscillatory behavior of
the hard thresholding mean-square error is primarily a consequence of its bias. From Figure 2(a) we can
verify that the improvement in mean-square error for $x_n \approx 0$ is due to $\partial b^{ht}(x_n)/\partial x_n < 0$, which shrinks
the variance term $\sigma_w^2(1 + \partial b^{ht}(x_n)/\partial x_n)^2$ below $\sigma_w^2$, whereas both a large bias magnitude $|b^{ht}(x_n)|$
and $\partial b^{ht}(x_n)/\partial x_n > 0$ contribute to the peak in the mean-square error.
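To see how (16) tracks the hard-thresholding error, the sketch below computes the biased CRB from the closed-form bias (31) and compares it with the exact mean-square error. The derivative $\partial b^{ht}/\partial x_n$ is approximated by a central difference. The sketch assumes $T = 2\sigma_w$ and uses the identity $\mathrm{mse}^{ht}(x_n) = \sigma_w^2\left(1 - \int_a^b z^2\phi(z)\,dz\right) + x_n^2 P(|y_n| < T)$, with $a = (-T - x_n)/\sigma_w$ and $b = (T - x_n)/\sigma_w$, which follows from (15) by substituting $y = x_n + \sigma_w z$. The helper names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))      # standard normal CDF


def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def bias_ht(x, T, s):
    """Closed-form bias (31), with Q(z) = erfc(z) / 2."""
    q = lambda z: 0.5 * math.erfc(z)
    xs, xd = (x + T) / (math.sqrt(2.0) * s), (x - T) / (math.sqrt(2.0) * s)
    return s * ((2.0 * math.pi) ** -0.5 * (math.exp(-xd ** 2) - math.exp(-xs ** 2))
                - (x / s) * (q(xd) - q(xs)))


def mse_ht(x, T, s):
    """Exact hard-threshold MSE via the standardized form of (15)."""
    a, b = (-T - x) / s, (T - x) / s
    prob = Phi(b) - Phi(a)                          # P(|y_n| < T)
    m2 = prob - (b * phi(b) - a * phi(a))           # int_a^b z^2 phi(z) dz
    return s * s * (1.0 - m2) + x * x * prob


def crb_biased(x, T, s, h=1e-5):
    """Scalar biased CRB (16), using a central-difference derivative of the bias."""
    b = bias_ht(x, T, s)
    db = (bias_ht(x + h, T, s) - bias_ht(x - h, T, s)) / (2.0 * h)
    return b * b + s * s * (1.0 + db) ** 2


s, T = 1.0, 2.0
for x in (0.0, 1.0, 2.0, 3.0, 5.0):
    print(x, round(mse_ht(x, T, s), 4), round(crb_biased(x, T, s), 4), "unbiased CRB:", s * s)
```

Near $x_n = 0$ the biased bound falls well below the unbiased bound $\sigma_w^2$, and both the bound and the error rise together near the threshold.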

We can further exploit the bias dependence of the mean-square error to improve the performance of 
the estimator on a sequence of coefficients with a given decay rate. We develop this in the next section. 

IV. Alternatives to Hard Thresholding 

As evidenced in hard thresholding estimators, the bias plays a significant role in determining the mean- 
square error behavior of an estimator. This dependence has been exploited in previous work to improve 
(a) Piecewise-linear (b) Semisoft shrinkage

Fig. 4. The input/output relations for two alternative piecewise-linear estimators. 



the mean-square error performance (relative to the unbiased Cramer-Rao bound) over the entire
parameter space [15]. Here, we shall instead aim at improving the average error performance of the hard
thresholding estimator when applied to multiple basis coefficients with a known decay rate.

In this section we consider two piecewise-linear estimators that generalize the hard thresholding
estimator. The first is given by

$$\hat{x}^{pl}(y_n) = \begin{cases} \alpha y_n, & \text{if } |y_n| < T, \\ y_n, & \text{if } |y_n| \geq T, \end{cases} \tag{17}$$

where $\alpha \in [0, 1]$ and $n = 1, \ldots, N$. From the input/output relation of the piecewise-linear estimator
shown in Figure 4(a), it is clear that $\alpha = 0$ corresponds to hard thresholding, whereas $\alpha = 1$ yields the
maximum-likelihood estimator. Thus, the slope of the line segment over $y_n \in [-T, T]$ is a degree of
freedom in the piecewise-linear estimator that encompasses both the maximum-likelihood estimator and
the hard thresholding estimator as special instances.
The bias of this estimator is given by

$$b^{pl}(x_n) = (1 - \alpha)\, b^{ht}(x_n), \tag{18}$$

and because $0 \leq 1 - \alpha \leq 1$, the bias and its derivative are those of hard thresholding scaled by $1 - \alpha$,
and hence have smaller magnitude, as can be verified from Figure 5(a). The mean-square error is given by

$$\mathrm{mse}^{pl}(x_n) = \sigma_w^2 - 2(1 - \alpha)\, x_n b^{ht}(x_n) - (1 - \alpha^2)\int_{-T}^{T} y^2\, p_w(y - x_n)\, dy, \tag{19}$$

and is plotted together with the mean-square error from hard thresholding in Figure 5(b). It is clear from
the plot that the piecewise-linear estimator has better worst-case mean-square error than hard thresholding,
but this comes at the price of worse best-case mean-square error.
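The following sketch evaluates (19) using closed-form truncated Gaussian moments, obtained by substituting $y = x_n + \sigma_w z$ into the integrals over $[-T, T]$. It reproduces the qualitative trade-off seen in Figure 5(b): a lower worst-case error but a higher best-case error than hard thresholding. The settings $T = 2\sigma_w$ and $\alpha = 0.5$ mirror Figure 5, while the evaluation grid and the helper names are hypothetical.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))      # standard normal CDF


def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def pl_estimate(y, T, alpha):
    """Piecewise-linear estimator (17), applied coefficient by coefficient."""
    return [alpha * yn if abs(yn) < T else yn for yn in y]


def mse_pl(x, T, alpha, s):
    """Eq. (19); alpha = 0 recovers hard thresholding (15) and alpha = 1 the ML estimator."""
    a, b = (-T - x) / s, (T - x) / s                  # integration limits in z = (y - x) / s
    p0, m1 = Phi(b) - Phi(a), phi(a) - phi(b)          # int phi(z) dz, int z phi(z) dz
    m2 = p0 - (b * phi(b) - a * phi(a))                # int z^2 phi(z) dz
    first = s * m1 + x * p0                            # int_{-T}^{T} y p_w(y - x) dy = -b_ht(x)
    second = s * s * m2 + 2.0 * x * s * m1 + x * x * p0
    return s * s + 2.0 * (1.0 - alpha) * x * first - (1.0 - alpha ** 2) * second


s, T = 1.0, 2.0                                        # T = 2 sigma_w, as in Figure 5
grid = [k / 100.0 for k in range(-600, 601)]
print("worst case:", max(mse_pl(x, T, 0.0, s) for x in grid), max(mse_pl(x, T, 0.5, s) for x in grid))
print("at x = 0:  ", mse_pl(0.0, T, 0.0, s), mse_pl(0.0, T, 0.5, s))
print(pl_estimate([0.5, -3.0, 1.0], T, 0.5))
```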

The second estimator we consider here is the "semisoft shrinkage" estimator [13], which replaces the
discontinuities in the hard thresholding estimator with a linear segment connecting the left and right limit
points, i.e.,

$$\hat{x}^{ss}(y_n) = \begin{cases} 0, & \text{if } |y_n| < T_0, \\ \beta\left(y_n - \mathrm{sgn}(y_n)\, T_0\right), & \text{if } T_0 \leq |y_n| < T, \\ y_n, & \text{if } |y_n| \geq T, \end{cases} \tag{20}$$

where $0 \leq T_0 < T$, $\beta = T/(T - T_0) \geq 1$ denotes the slope of the line segment shown in Figure 4(b), and

$$\mathrm{sgn}(y) = \begin{cases} 1, & y > 0, \\ 0, & y = 0, \\ -1, & y < 0, \end{cases} \tag{21}$$

is the signum function. Note that the shrinkage estimator reduces to hard thresholding in the limit $T_0 \to T$, but
it does not otherwise encompass the piecewise-linear estimator in (17).
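A direct transcription of (20) and (21) into Python is sketched below. `semisoft` is a hypothetical helper, and it assumes $0 \le T_0 < T$ so that $\beta$ is finite. The example includes the breakpoints $|y_n| = T_0$ and $|y_n| = T$ to show that the output is continuous there.

```python
def sgn(y):
    """Signum function (21)."""
    return 1.0 if y > 0 else (-1.0 if y < 0 else 0.0)


def semisoft(y, T0, T):
    """Semisoft shrinkage (20) for one observation; assumes 0 <= T0 < T."""
    if not 0.0 <= T0 < T:
        raise ValueError("require 0 <= T0 < T")
    beta = T / (T - T0)                      # slope of the middle segment
    if abs(y) < T0:
        return 0.0                           # dead zone, as in hard thresholding
    if abs(y) < T:
        return beta * (y - sgn(y) * T0)      # linear ramp joining (T0, 0) to (T, T)
    return y                                 # identity, as in the ML estimator


print([semisoft(y, 1.0, 2.0) for y in (0.5, 1.0, 1.5, 2.0, -1.5, -3.0)])
# [0.0, 0.0, 1.0, 2.0, -1.0, -3.0]
```

Setting `T0 = 0` gives $\beta = 1$ and hence the identity map, while letting `T0` approach `T` steepens the ramp toward the hard-threshold jump.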

The bias and mean-square error expressions for the shrinkage estimator are less tractable than in the
previous case. Nonetheless, its bias can be expressed as

$$b^{ss}(x_n) = b^{ht}(x_n) + \beta\int_{T_0}^{T} (y - T_0)\left(p_w(y - x_n) - p_w(y + x_n)\right) dy, \tag{22}$$

where $b^{ht}$ is evaluated with threshold $T$. This bias is plotted in Figure 5(a) together with those obtained
from the estimators introduced thus far. The mean-square error of this shrinkage estimator takes the form

$$\mathrm{mse}^{ss}(x_n) = \mathrm{mse}^{ht}(x_n) + f(x_n) + f(-x_n), \tag{23}$$

where $\mathrm{mse}^{ht}$ again uses threshold $T$ and

$$f(x) = \int_{T_0}^{T} \left(\beta^2 (y - T_0)^2 - 2x\beta(y - T_0)\right) p_w(y - x)\, dy. \tag{24}$$

This mean-square error is compared to that of the previous estimators in Figure 5(b). It is seen that the
shrinkage estimator has error similar to that obtained from (17), but the peak of the oscillation is slightly
skewed.
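Expressions (22)-(24) follow from splitting the expectation integrals at the breakpoints of (20), so they can be checked against a brute-force evaluation of $E[\hat{x}^{ss}(y_n)] - x_n$ and $E[(\hat{x}^{ss}(y_n) - x_n)^2]$. The sketch below does this with piecewise Simpson quadrature, assuming $T = 2\sigma_w$ and $T_0 = 0.5T$ as in Figure 5. Truncating the outer integrals at $|x_n| + 12\sigma_w$ is an assumption of the sketch, and the resulting tail error is negligible. All helper names are hypothetical.

```python
import math


def p_w(y, s):
    return math.exp(-y * y / (2.0 * s * s)) / math.sqrt(2.0 * math.pi * s * s)


def simpson(f, a, b, steps=2000):
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def semisoft(y, T0, T):
    beta = T / (T - T0)
    if abs(y) < T0:
        return 0.0
    return beta * (y - math.copysign(T0, y)) if abs(y) < T else y


def expect(g, x, T0, T, s):
    """E[g(x_hat(y))] for y = x + w, integrating piecewise between the breakpoints of (20)."""
    L = abs(x) + 12.0 * s
    knots = [-L, -T, -T0, T0, T, L]
    return sum(simpson(lambda y: g(semisoft(y, T0, T)) * p_w(y - x, s), lo, hi)
               for lo, hi in zip(knots, knots[1:]))


def bias_ss(x, T0, T, s):
    """Eq. (22), with b_ht evaluated at threshold T."""
    beta = T / (T - T0)
    b_ht = -simpson(lambda y: y * p_w(y - x, s), -T, T)
    return b_ht + beta * simpson(lambda y: (y - T0) * (p_w(y - x, s) - p_w(y + x, s)), T0, T)


def f(x, T0, T, s):
    """Eq. (24)."""
    beta = T / (T - T0)
    return simpson(lambda y: (beta ** 2 * (y - T0) ** 2 - 2 * x * beta * (y - T0)) * p_w(y - x, s), T0, T)


def mse_ss(x, T0, T, s):
    """Eq. (23): mse_ht(x) at threshold T, plus f(x) + f(-x)."""
    b_ht = -simpson(lambda y: y * p_w(y - x, s), -T, T)
    mse_ht = s * s - 2 * x * b_ht - simpson(lambda y: y * y * p_w(y - x, s), -T, T)
    return mse_ht + f(x, T0, T, s) + f(-x, T0, T, s)


s, T = 1.0, 2.0
T0 = 0.5 * T
for x in (0.0, 0.8, 1.7, 3.0):
    direct_bias = expect(lambda v: v, x, T0, T, s) - x
    direct_mse = expect(lambda v: (v - x) ** 2, x, T0, T, s)
    print(x, round(bias_ss(x, T0, T, s) - direct_bias, 9), round(mse_ss(x, T0, T, s) - direct_mse, 9))
```

Both printed differences should be at the level of quadrature round-off.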
Fig. 5. The bias and mean-square error of the two piecewise-linear estimators (PL, SS) and the hard thresholding (HT) estimator
are plotted as a function of the true value of the signal coefficient. All axis variables are normalized to be dimensionless.
$T = 2\sigma_w$ for all of the plots, $\alpha = 0.5$ is the slope of the piecewise-linear estimator, and $T_0 = 0.5T$ for the shrinkage estimator.



To demonstrate that piecewise-linear estimation improves performance over hard thresholding, we
return to the vector-valued AWGN estimation problem and compare the mean-square error per symbol
obtained with a hard thresholding estimator and with the two piecewise-linear estimators introduced above.
We assume that there are $N$ basis coefficients to be estimated from the same number of observations.
Furthermore, because all three estimators have mean-square error that is symmetric in the true value
of the parameter, we shall assume with no loss of generality that $x_n \geq 0$ for all $n = 1, \ldots, N$. Finally, we
assume that the true values of the coefficients, when sorted, have a decay rate governed by a generalized
Gaussian function, i.e., we assume the coefficient sequence is given by

$$x_n = \kappa(p)\, e^{-|\lambda(n-1)|^p} \tag{25}$$

for $n = 1, \ldots, N$, where $p > 0$ is the decay rate, $\kappa(p) > 0$ is a scaling factor such that the energy of the
sequence is equal for all values of $p$, and the closest integer to $1/\lambda$ is approximately the $e^{-1}$ attenuation
point of the coefficients. The signal-to-noise ratio of such a sequence is defined as the ratio of the total
signal energy to the total noise energy, i.e.,

$$\mathrm{SNR} = \frac{\sum_{n=1}^{N} |x_n|^2}{N\sigma_w^2}. \tag{26}$$
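The coefficient model (25) and the SNR definition (26) can be sketched as follows. Here $\kappa(p)$ is computed so that every sequence has the same prescribed energy. The target energy of $1187\sigma_w^2$ is a hypothetical value, chosen only because it gives an SNR of roughly 10.7 dB for $N = 101$; it does not reproduce the paper's own normalization described in footnote 4.

```python
import math


def decay_sequence(N, lam, p, energy):
    """Coefficients x_n = kappa(p) * exp(-|lam (n - 1)|^p), n = 1..N, as in (25).

    kappa(p) is chosen so that sum_n x_n^2 equals `energy`, for every p.
    """
    raw = [math.exp(-abs(lam * (n - 1)) ** p) for n in range(1, N + 1)]
    kappa = math.sqrt(energy / sum(r * r for r in raw))
    return [kappa * r for r in raw]


def snr_db(x, sigma_w):
    """Eq. (26) in decibels: total signal energy over total noise energy N sigma_w^2."""
    return 10.0 * math.log10(sum(v * v for v in x) / (len(x) * sigma_w ** 2))


N, lam, sigma_w, energy = 101, 0.04, 1.0, 1187.0   # `energy` is a hypothetical choice
for p in (1.0, 75.0):
    x = decay_sequence(N, lam, p, energy)
    # x[25] is n = 26, where lam * (n - 1) = 1, i.e., the e^{-1} attenuation point.
    print(p, round(x[0], 3), round(x[25] / x[0], 4), round(snr_db(x, sigma_w), 2))
```

For large $p$ the sequence is nearly flat up to $n - 1 \approx 1/\lambda$ and then drops abruptly. This produces the two-cluster histogram discussed for $p = 75$.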
Recall that the estimators act on each coefficient separately and the noise on each coefficient is 
independent. Therefore, for all three estimators, the total mean-square error is equal to the sum of the
per-coefficient mean-square errors, i.e.,

$$\mathrm{mse}^{m} = \sum_{n=1}^{N} \mathrm{mse}^{m}(x_n), \tag{27}$$

where method $m$ is ht for hard thresholding, pl for piecewise linear, or ss for semisoft shrinkage.
Furthermore, the average mean-square error per symbol is defined as

$$\overline{\mathrm{mse}}^{m} = \frac{1}{N}\,\mathrm{mse}^{m}. \tag{28}$$

Figure 6 plots the average mean-square error obtained from the three estimators as a function of the
decay rate of the coefficients when $N = 101$, $\lambda = 0.04$, and all the degrees of freedom of the estimators are
optimized, i.e., the optimization is carried out over $T$ in (13), over $\alpha$ and $T$ in (17), and over $T_0$ and $T$ in (20).⁴
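The optimization behind Figure 6 can be illustrated with a brute-force grid search. This is a hypothetical stand-in for the numerical procedure described in footnote 4, not a reproduction of it. The sketch minimizes (28) over $T$ for hard thresholding and over $(\alpha, T)$ for the piecewise-linear estimator (17), using the per-coefficient error (19). The semisoft estimator is omitted for brevity. The grid spacing and the energy value $1187\sigma_w^2$ (about 10.7 dB SNR) are assumptions, and the printed numbers are not claimed to match Figure 6.

```python
import math


def Phi(z):
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def phi(z):
    return math.exp(-z * z / 2.0) / math.sqrt(2.0 * math.pi)


def mse_pl(x, T, alpha, s):
    """Per-coefficient MSE (19); alpha = 0 is hard thresholding (15)."""
    a, b = (-T - x) / s, (T - x) / s
    p0, m1 = Phi(b) - Phi(a), phi(a) - phi(b)
    m2 = p0 - (b * phi(b) - a * phi(a))
    first = s * m1 + x * p0                              # int_{-T}^{T} y p_w(y - x) dy
    second = s * s * m2 + 2.0 * x * s * m1 + x * x * p0  # int_{-T}^{T} y^2 p_w(y - x) dy
    return s * s + 2.0 * (1.0 - alpha) * x * first - (1.0 - alpha ** 2) * second


def avg_mse(xs, T, alpha, s):
    """Average MSE per symbol, eqs. (27)-(28)."""
    return sum(mse_pl(x, T, alpha, s) for x in xs) / len(xs)


def grid_search(xs, s, alphas):
    """Return (average MSE, T, alpha) minimizing (28) over a coarse, hypothetical grid."""
    thresholds = [0.1 * k * s for k in range(61)]        # T in [0, 6 sigma_w]; T = 0 is ML
    return min((avg_mse(xs, T, a, s), T, a) for T in thresholds for a in alphas)


N, lam, s, energy = 101, 0.04, 1.0, 1187.0               # energy: hypothetical, ~10.7 dB SNR
results = {}
for p in (1.0, 75.0):
    raw = [math.exp(-abs(lam * (n - 1)) ** p) for n in range(1, N + 1)]
    kappa = math.sqrt(energy / sum(r * r for r in raw))  # equal energy for every p, as in (25)
    xs = [kappa * r for r in raw]
    best_ht = grid_search(xs, s, alphas=[0.0])
    best_pl = grid_search(xs, s, alphas=[0.1 * k for k in range(11)])
    results[p] = (best_ht, best_pl)
    print(p, best_ht, best_pl)
```

Because $\alpha = 0$ lies on the grid, the piecewise-linear optimum can never be worse than the hard-thresholding optimum. This mirrors the ordering of the curves in Figure 6.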

The plot shows that the average mean-square error improves for all values of the decay rate when 
either of the piecewise-linear estimators is utilized in place of the hard thresholding estimator. Let us 
consider the limiting cases. The histogram for p = 75 shows that for fast decay rates the coefficients are 
clustered into two groups: a significant number of the coefficients are very close to 0, while the remaining 
coefficients are grouped at a large value determined by the SNR of the signal. Consequently, very few 
coefficients are in the intermediate region. Such a histogram is ideal for hard thresholding because the 
optimum threshold places the region of significant mean-square error in the gap between the two sets of
coefficients. Thus, the error incurred per large coefficient becomes $\sigma_w^2$ (the maximum-likelihood estimator
limit), whereas coefficients that are approximately 0 yield an error that is a fraction of $\sigma_w^2$. In this regime, the
optimal slope $\alpha$ in (17) approaches 0, while the optimal threshold equals that of hard thresholding. Hence
the mean-square errors of the two estimators converge for $p \gg 1$. On the other hand, the smoother
transition at the thresholding boundaries in the semisoft shrinkage estimator reduces the error incurred
from the few coefficients that fall within the intermediate region, without significant impact on the errors
incurred from the two main clusters of coefficients. Hence, for $p \gg 1$ the shrinkage estimator slightly
outperforms the other two estimators.

In the opposite limiting case, the slow decay rate implies that the coefficients will be more spread out 
over the parameter space, as evidenced by the histogram for p = 1. Consequently, the error incurred from 
the coefficients with intermediate values becomes prohibitively large when either the hard thresholding or 

⁴All coefficient sequences (each with a different $p$) were normalized to the same energy (to attain identical SNR in all sequences),
where the normalization constant was chosen such that the largest coefficient over all $p$ was $10\sigma_w$. Then, the optimization was
performed numerically (using the analytical expressions given in Appendix A), given the constraints $\alpha \in [0, 1]$ and $T > T_0 > 0$.



Fig. 6. Mean-square error per symbol of the three estimators analyzed herein. The dash-dotted (blue) curve is the mean-square
error of hard thresholding estimators, the solid (red) curve is that of piecewise-linear estimators of the form (17), and the dashed
(green) curve is the mean-square error of semisoft shrinkage estimators. Histograms refer to signal coefficients with decay rate
$p = 1$ and $p = 75$, respectively. Optimization over threshold values and slope is performed numerically from analytic expressions
(see footnote 4 for details). Parameter values are $N = 101$, $\lambda = 0.04$, and SNR = 10.7 dB.



the shrinkage estimator is utilized. Hence, when $p \ll 1$, the optimal threshold parameters for both of these
estimators lead to the maximum-likelihood estimator (i.e., $T \to 0$ for hard thresholding and $T_0 \to 0$ for shrinkage).
On the other hand, the optimal values for the piecewise-linear estimator in (17) turn out
to be a large threshold value combined with a slope slightly smaller than unity. Therefore, an estimator
that is linear over a significant portion of the coefficients, but with a more conservative slope than the
maximum-likelihood estimator, reduces the average mean-square error in comparison to that obtained
from the maximum-likelihood estimator. This result is not entirely unexpected, as it has been shown
previously that a (biased) linear estimator with slope less than unity performs better than the maximum- 
likelihood estimator over the entire parameter space [15]. In the case considered herein, the slow decay 
rate implies that the coefficients will be spread out over the parameter space, thus the estimator that 
minimizes the average mean-square error per coefficient must perform well over a large subspace of the 
parameter space, and this is consistent with the optimization criterion considered in [15]. 

The optimal values of the degrees of freedom for all three estimators are plotted in Figure 7 as a
function of the decay rate p. 
(a) Slope (b) Thresholds (horizontal axes: decay rate, $p$; SNR = 10.7 dB)

Fig. 7. The optimum slope of the piecewise-linear estimator and the optimal threshold values of all estimators are plotted as
a function of the decay rate of the signal coefficients. Values are determined numerically from analytical expressions given in
Appendix A. 'HT' denotes the hard thresholding estimator, 'PL' is the piecewise-linear estimator in (17), and 'SS' denotes the
semisoft shrinkage estimator. Parameter values are $N = 101$, $\lambda = 0.04$, and SNR = 10.7 dB.
V. Discussion

In this paper we have provided an estimation theoretic study of non-random signal estimation in order 
to deepen our understanding of some of the common results encountered in wavelet-based estimation 
techniques. We focused on the problem of estimating the basis expansion coefficients of a signal when 
the coefficients are corrupted with additive white Gaussian noise. 

The main results developed in this paper can be summarized as follows. The mean-square error lower 
bound that applies to unbiased estimators indicates that linear estimation is optimal, and furthermore, 
the basis choice for decomposing the signal does not affect optimal performance. This is, of course, an 
expected result, since for this Gaussian observation model the optimal unbiased estimator is linear. Furthermore, if one
obtains an optimal solution in one basis, a non-singular transformation of the coordinate system does not
affect the minimum achievable mean-square error, because the optimal estimator in the new basis will 
simply invert the transformation and apply the former estimator to achieve the same minimum. 

The results for biased estimators however differ notably from those for unbiased estimators. Our 
analysis of hard thresholding as an example of a biased estimator shows that biased estimators are not 
constrained by the unbiased version of the Cramer-Rao bound. Furthermore, the extension of this bound 
for biased estimators does not yield achievable lower bounds on performance. Hence, optimality arguments 
for biased estimators are inevitably more heuristic. Our analysis for hard thresholding demonstrated 
that basis representation indeed does affect performance when such an estimator is used, as the basis 
coefficients must be well separated into values that are very close to zero (approximately within one 
standard-deviation of the noise) and values that are large (significantly larger than the noise standard 
deviation and the threshold value) to obtain mean-square error smaller than the error of a maximum- 
likelihood estimator. In other words, the decay rate of the sorted basis coefficients must be fast. Therefore, 
for the class of signals that have fast-decaying wavelet coefficients, wavelet-basis decompositions in 
conjunction with hard thresholding will be effective in denoising the observed signal. 

Nevertheless, the Cramer-Rao bound for biased estimators provides additional information on how the 
bias of an estimator affects mean-square error performance. In particular, through our analysis of this 
bound for the hard thresholding estimator we motivated piecewise-linear estimators and subsequently
demonstrated that they achieve smaller average mean-square error when the decay of the sorted coefficients
is governed by a generalized Gaussian function.

In summary, although prior literature provides abundant analysis on the reduction of mean-square error 
by utilizing wavelet basis expansions and thresholding estimators, additional insight can be obtained by 
connecting the recent advances in wavelet-based techniques with more traditional estimation theoretic
analysis. In this paper we have provided such a connection through non-random signal estimation theory 
for the AWGN problem, and we have used our analysis to demonstrate that piecewise-linear estimators 
improve the average mean-square error attained via hard thresholding. 

Appendix A 

Analytical Expressions for Bias and Mean-Square Error in Hard Thresholding 
Let us define

$$x_S = (x_n + T)/(\sqrt{2}\,\sigma_w), \tag{29}$$
$$x_D = (x_n - T)/(\sqrt{2}\,\sigma_w). \tag{30}$$

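Evaluating the bias for the hard thresholding estimator from (14) gives

$$\frac{b^{ht}(x_n)}{\sigma_w} = (2\pi)^{-1/2}\left(e^{-x_D^2} - e^{-x_S^2}\right) - \frac{x_n}{\sigma_w}\left(Q(x_D) - Q(x_S)\right), \tag{31}$$

where

$$Q(x) = \pi^{-1/2}\int_x^{\infty} e^{-t^2}\, dt, \tag{32}$$

i.e., $Q(x) = \tfrac{1}{2}\mathrm{erfc}(x)$.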

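The mean-square error, on the other hand, is obtained from (15) as

$$\frac{\mathrm{mse}^{ht}(x_n)}{\sigma_w^2} = 1 + \left(\frac{x_n}{\sigma_w}\right)^2\left(Q(x_D) - Q(x_S)\right) + 2^{-1}\left(\mathrm{sgn}(x_D)\,\Gamma_{inc}(x_D^2, 3/2) - \mathrm{sgn}(x_S)\,\Gamma_{inc}(x_S^2, 3/2)\right), \tag{33}$$

where $\mathrm{sgn}(\cdot)$ is defined in (21) and

$$\Gamma_{inc}(x, 3/2) = \frac{2}{\sqrt{\pi}}\int_0^{x} t^{1/2} e^{-t}\, dt \tag{34}$$

is the regularized lower incomplete Gamma function of order 3/2 (i.e., the cumulative distribution function of a
Gamma distribution with shape parameter 3/2 and unit scale).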
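The closed form (33) can be verified against direct quadrature of (15). The sketch below relies on the standard identity $\Gamma_{inc}(z, 3/2) = \mathrm{erf}(\sqrt{z}) - 2\sqrt{z/\pi}\,e^{-z}$, which follows by differentiating both sides and comparing with (34), so only Python's `math` module is needed. The parameter pairs tested are arbitrary.

```python
import math


def q(z):
    return 0.5 * math.erfc(z)                     # Q(z), eq. (32)


def gamma_inc_32(z):
    """Regularized lower incomplete Gamma function of order 3/2, eq. (34)."""
    return math.erf(math.sqrt(z)) - 2.0 * math.sqrt(z / math.pi) * math.exp(-z)


def sgn(v):
    return (v > 0) - (v < 0)                      # signum (21)


def mse_ht_closed(x, T, s):
    """Eq. (33), multiplied back by sigma_w^2."""
    xs = (x + T) / (math.sqrt(2.0) * s)
    xd = (x - T) / (math.sqrt(2.0) * s)
    val = (1.0 + (x / s) ** 2 * (q(xd) - q(xs))
           + 0.5 * (sgn(xd) * gamma_inc_32(xd ** 2) - sgn(xs) * gamma_inc_32(xs ** 2)))
    return s * s * val


def mse_ht_numeric(x, T, s, steps=4000):
    """Eq. (15) evaluated with the composite Simpson rule."""
    pw = lambda y: math.exp(-(y - x) ** 2 / (2 * s * s)) / math.sqrt(2 * math.pi * s * s)
    h = 2.0 * T / steps
    i1 = i2 = 0.0
    for k in range(steps + 1):
        y = -T + k * h
        w = 1 if k in (0, steps) else (4 if k % 2 else 2)
        i1 += w * y * pw(y)
        i2 += w * y * y * pw(y)
    i1, i2 = i1 * h / 3.0, i2 * h / 3.0
    return s * s + 2.0 * x * i1 - i2             # since b_ht(x) = -i1


for x in (-3.0, 0.0, 2.0, 4.5):
    print(x, mse_ht_closed(x, 2.0, 1.0), mse_ht_numeric(x, 2.0, 1.0))
```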

To provide analytical expressions for the shrinkage estimator we must define two new dimensionless 
variables, 

$$\xi_S = (x_n + T_0)/(\sqrt{2}\,\sigma_w), \tag{35}$$
$$\xi_D = (x_n - T_0)/(\sqrt{2}\,\sigma_w). \tag{36}$$
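Now the bias is given by

$$b^{ss}(x_n) = (1 - \beta)\, b^{ht}(x_n; T) + \beta\, b^{ht}(x_n; T_0) - \beta T_0\left(Q(x_D) - Q(\xi_D) + Q(x_S) - Q(\xi_S)\right), \tag{37}$$

where the second argument in $b^{ht}$ indicates the threshold value. The mean-square error expression (23)
depends on (33) and on the function $f$ in (24), which evaluates to

$$\frac{f(x_n)}{\beta^2\sigma_w^2} = \pi^{-1/2}\left(x_D\left(1 - \frac{2T_0}{T}\right)e^{-x_D^2} - \left(\xi_S - \frac{\sqrt{2}\,T_0 x_n}{\sigma_w T}\right)e^{-\xi_D^2}\right) + \left(1 - 2\xi_D\left(\xi_S - \frac{\sqrt{2}\,T_0 x_n}{\sigma_w T}\right)\right)\left(Q(x_D) - Q(\xi_D)\right). \tag{38}$$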
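The closed forms (37) and (38) can be checked against quadrature of their integral definitions (22) and (24). In the sketch below, `eta` is a hypothetical shorthand for $\xi_S - \sqrt{2}\,T_0 x_n/(\sigma_w T)$, the quantity that appears twice in (38). The parameter values $T = 2\sigma_w$ and $T_0 = \sigma_w$ are arbitrary.

```python
import math


def q(z):
    return 0.5 * math.erfc(z)                  # Q(z), eq. (32)


def p_w(y, s):
    return math.exp(-y * y / (2.0 * s * s)) / math.sqrt(2.0 * math.pi * s * s)


def simpson(f, a, b, steps=2000):
    h = (b - a) / steps
    total = f(a) + f(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0


def b_ht(x, t, s):
    """Closed-form hard-threshold bias (31) at threshold t."""
    xs, xd = (x + t) / (math.sqrt(2.0) * s), (x - t) / (math.sqrt(2.0) * s)
    return s * ((2.0 * math.pi) ** -0.5 * (math.exp(-xd ** 2) - math.exp(-xs ** 2))
                - (x / s) * (q(xd) - q(xs)))


def b_ss_closed(x, T0, T, s):
    """Eq. (37)."""
    r = math.sqrt(2.0) * s
    beta = T / (T - T0)
    xs, xd, xi_s, xi_d = (x + T) / r, (x - T) / r, (x + T0) / r, (x - T0) / r
    return ((1 - beta) * b_ht(x, T, s) + beta * b_ht(x, T0, s)
            - beta * T0 * (q(xd) - q(xi_d) + q(xs) - q(xi_s)))


def f_closed(x, T0, T, s):
    """Eq. (38)."""
    r = math.sqrt(2.0) * s
    beta = T / (T - T0)
    xd, xi_s, xi_d = (x - T) / r, (x + T0) / r, (x - T0) / r
    eta = xi_s - math.sqrt(2.0) * T0 * x / (s * T)
    val = (math.pi ** -0.5 * (xd * (1 - 2 * T0 / T) * math.exp(-xd ** 2) - eta * math.exp(-xi_d ** 2))
           + (1 - 2 * xi_d * eta) * (q(xd) - q(xi_d)))
    return beta ** 2 * s * s * val


def b_ss_quad(x, T0, T, s):
    """Eq. (22) by quadrature."""
    beta = T / (T - T0)
    return (-simpson(lambda y: y * p_w(y - x, s), -T, T)
            + beta * simpson(lambda y: (y - T0) * (p_w(y - x, s) - p_w(y + x, s)), T0, T))


def f_quad(x, T0, T, s):
    """Eq. (24) by quadrature."""
    beta = T / (T - T0)
    return simpson(lambda y: (beta ** 2 * (y - T0) ** 2 - 2 * x * beta * (y - T0)) * p_w(y - x, s), T0, T)


s, T, T0 = 1.0, 2.0, 1.0
for x in (-1.5, 0.0, 1.0, 2.5):
    print(x, b_ss_closed(x, T0, T, s) - b_ss_quad(x, T0, T, s), f_closed(x, T0, T, s) - f_quad(x, T0, T, s))
```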



Appendix B 
"Optimal Oracle" Bound 

We simply reproduce the derivation in [12, Ch. 10]. Consider a scalar estimator of the form

$$\hat{x}(y) = ay, \tag{39}$$

where $a \in \mathbb{R}$ is deterministic, for the scalar AWGN estimation problem. Then

$$E\left[(\hat{x}(y) - x)^2\right] = a^2\sigma_w^2 + (1 - a)^2 x^2. \tag{40}$$

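Differentiating this expression with respect to $a$ and setting the derivative to 0, we find the value that
minimizes the mean-square error as

$$a_{opt} = \frac{x^2}{x^2 + \sigma_w^2}, \tag{41}$$

and the minimum mean-square error is

$$\min_a E\left[(ay - x)^2\right] = \frac{x^2\sigma_w^2}{x^2 + \sigma_w^2}. \tag{42}$$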



Equation (42) is the oracle bound used in [12, Ch. 10] and plotted in Figure 3. Because $a_{opt}$ depends on
$x$ (which is unknown), such an estimator is not feasible. Hence this mean-square error is not achievable
by any feasible estimator of the form given in (39).
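A few lines of Python confirm (41)-(42) by comparing the closed form with a brute-force search over $a \in [0, 1]$. The grid resolution and test values are hypothetical. Note that computing `a_opt` requires the true $x$, which is exactly why the bound is not achievable.

```python
def oracle_mse(a, x, s):
    """Eq. (40): MSE of the scalar linear estimator a*y, with y = x + w and var(w) = s^2."""
    return a * a * s * s + (1.0 - a) ** 2 * x * x


def oracle_bound(x, s):
    """Eq. (42), attained at a_opt = x^2 / (x^2 + s^2) from (41)."""
    return x * x * s * s / (x * x + s * s)


s = 1.0
for x in (0.0, 0.5, 2.0, 10.0):
    a_opt = x * x / (x * x + s * s)                  # depends on the unknown x: not feasible
    grid_min = min(oracle_mse(k / 1000.0, x, s) for k in range(1001))
    print(x, a_opt, oracle_bound(x, s), grid_min)
```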